Creation of Text Document Matrices and Visualization by Self-Organizing Map

نویسندگان

  • Pavel Stefanovic
  • Olga Kurasova
چکیده

In the paper, text mining and visualization by self-organizing map (SOM) are investigated. At first, textual information must be converted into numerical one. The results of text mining and visualization depend on the conversion. So, the influence of some control factors (the common word list and usage of the stemming algorithm) on text mining results, when a document dictionary is created, is investigated. A self-organizing map is used for text clustering and graphical representation (visualization). A comparative analysis is made where a dataset consists of scientific papers about the optimization, based on Pareto, simplex, and genetic algorithms. Two new measures are also proposed to estimate the SOM quality when the classified data are analyzed: distances between SOM cells, corresponding to data items assigned to the same class, and the distance between centers of SOM cells, corresponding to different classes. The quantization error is measured to estimate the SOM quality, too.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps

Clustering and visualization of large text document collections aids in browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents based on topical content and renders clusters in an intuitive twodimensional format. Document topics are inferred usin...

متن کامل

Information Visualization with Self-Organizing Maps

The Self-Organizing Map (SOM) is an unsupervised neural network algorithm that projects highdimensional data onto a two-dimensional map. The projection preserves the topology of the data so that similar data items will be mapped to nearby locations on the map. Despite the popular use of the algorithm for clustering and information visualisation, a system has been lacking that combines the fast ...

متن کامل

Exploration of Full-text Databases with Self-organizing Maps

Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on...

متن کامل

A New Evolving Tree for Text Document Clustering and Visualization

The Self-Organizing Map (SOM) is a popular neural network model for clustering and visualization problems. However, it suffers from two major limitations, viz., (1) it does not support online learning; and (2) the map size has to be pre-determined and this can potentially lead to many “trial-and-error” runs before arriving at an optimal map size. Thus, an evolving model, i.e., the Evolving Tree...

متن کامل

Text Mining with the WEBSOM

The emerging eld of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • ITC

دوره 43  شماره 

صفحات  -

تاریخ انتشار 2014